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Abstract: I present a critical review of techniques for estimating confidence intervals on binomial 
population proportions inferred from success counts in small-to-intermediate samples. Population pro- 
portions arise frequently as quantities of interest in astronomical research; for instance, in studies 
aiming to constrain the bar fraction, AGN fraction, SMBH fraction, merger fraction, or red sequence 
fraction from counts of galaxies exhibiting distinct morphological features or stellar populations. How- 
ever, two of the most widely-used techniques for estimating binomial confidence intervals — the 'normal 
approximation' and the Clopper & Pearson approach — are liable to misrepresent the degree of sta- 
tistical uncertainty present under sampling conditions routinely encountered in astronomical surveys, 
leading to an ineffective use of the experimental data (and, worse, an inefficient use of the resources 
expended in obtaining that data). Hence, I provide here an overview of the fundamentals of binomial 
statistics with two principal aims: (i) to reveal the ease with which (Bayesian) binomial confidence in- 
tervals with more satisfactory behaviour may be estimated from the quantiles of the beta distribution 
using modern mathematical software packages (e.g. R, MATLAB, mathematica, IDL, python); and 
(il) to demonstrate convincingly the major flaws of both the 'normal approximation' and the Clopper 
& Pearson approach for error estimation. 
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1 Introduction 

One problem frequently encountered in astronomical 
research is that of estimating a confidence interval 
(CI) on the value of an unknown population propor- 
tion based on the observed number of success counts in 
a given sample. The unknown population proportion 
may be, for instance, the intrinsic fraction of barred 
disk galaxies at a specific epoch to be inferred from the 
observed num ber of barred disks in a volume- limited 
samp l e (e.g. [f Clmegrce n et all ll99(J: Iv an den Bergbl 
120021 : ICameron et all l20ld : iNair fe Abraham! |20ld ). 
with the corresponding binomial CI used to eval- 
uate the hypothesis that the bar fraction changes 
with redshift relativ e to a local benchmark (e.g. 
ICameron et al.l l2010h . Experiments to investigate 
the role of mass and environment in quenching star- 
formation via measurement of the galaxy red sequence 
fraction (e.g. iBaldrv et all 120061 : iHester et all 120101 : 
lllbert et alJl2010D . or to investigate whether or not 
major mergers were more frequent at high redshift 
via meas urement of the close-pair/asymmetric frac- 
tion Ce.g.lDe Propris et al.ll2005l : IConselice et al.ll2008l : 
lLopez-Saniuan et al.ll2oiu i. also routinely present this 
class of problem. 

However, the two most commonly used meth- 
ods for estimating CIs on binomial population 
proportions — the ' n ormal approximation' and the 
IClopper fc Pearson! ll 19341 ) approach — exhibit signif- 
icant flaws under routine sampling conditions (cf. 



IVollsetlll993l : ISantnenll99Sl : iBrown et al ] 120011 . 120021 ). 

In particular, the 'normal approximation' (also called 
the 'Poisson error') may systematically under-estimate 
the CI width necessary to provide coverage at the de- 
sired level, especially for small samples, but even for 
rather large samples when the true population propor- 
tion is either very low or very high. If used naively 
the 'normal approximation' has the potential to mis- 
lead one into over-stating the significance of one's in- 
ferences concerning the physical system under study 
formulated on the basis of the observed data. 

Astronomers aware of these flaws in the 'nor- 
mal approximation ' ofte n adopt the alternative 
IClopper fc Pearson! (|1934T ) approach to CI estima- 
tion b y way of reference t o the CI tables in IGehrelsl 
(| 19861 ). Unfortunately, the lClopper fc Pearson! (|l934T ) 
approach suffers from the opposite problem to that 
of the 'normal approximation' — namely, a systematic 
over-estimation of the CI width required to p r ovide the 
desired covera ge (IClopper fc Pearsonl Il934l : iNevmanl 
119351 : IGehrelsl 1 19861 : lAgresti fc Coul1ll998h . In sci- 
entific research this over-estimation of the statistical 
measurement uncertainties may mislead one into plac- 
ing insufficient confidence in the experimental out- 
comes, resulting in an inefficient use of the measured 
data (and, hence, the resources expended in obtain- 
ing that data). Indee d, it has been well argued by 
lAgresti fc Cordl ||1998T) that in many practical appli- 
cations even the 'normal approximation', despite it s 
flaws, is preferable to the IClopper fc Pearson! (|1934T ) 
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approach. 

Fortunately, there exist a multitude of alterna- 
tive methods for generating CIs on binomial popula- 
tion proportions, many of which exhibit far more sat- 
isfactory beha viour than either the 'nor mal approxi- 
mat ion' or the[Clopper fc Pearson! (I1934T) approach— 
see lAgresti fc Coulll (| 19981 ) and iBrown et all (|200ll ) 
for various examples. Here I review both the the- 
ory and application of one of these methods — use 
of the beta distribution quantiles — deriving from a 
simple Bayesian analysis in which a uniform ('non- 
informative') pr ior is adopted for t he true population 
proportion fe.g. lGelman et al. 2003). As I will demon- 
strate, the beta distribution generator for binomial CIs 
is both theoretically well-motivated and easily applied 
in practice using widely available mathematical soft- 
ware packages (e.g. R, matlab, mathematica, IDL, 
python). Ultimately, I advocate strongly that this 
strategy for estimating binomial CIs be adopted in fu- 
ture studies aiming to constrain fundamental popu- 
lation proportions in astronomical research (e.g. the 
galaxy bar fraction, red sequence fraction, or merger 
fraction) — especially for samples intrinsically of small- 
to-intermediate size, or when the subdivision of larger 
samples for analytical purposes produces sparsely pop- 
ulated data bins. 

2 The Binomial Distribution 

In probability theory any experiment for which there 
are only two possible random outcomes — success, oc- 
curring with probability, p, or failure, occurring with 
probability, q = (1 — p) — is referred to as a Bernoulli 
trial. Examples of Bernoulli trials in astronomical re- 
search may include asking whether or not a randomly 
sampled galaxy is barred, red-sequence, or merging. 
The probability, P, of observing a particular number 
of successes, k, in a series of n independent Bernoulli 
trials (with common success probability, p) is governed 
by the binomial probability functiorQ: 

P(k,n,p)= (ljp k q n ~ k (1) 
where O^fc^n, fc e Z (an integer), and 

\k) = k\{n-k)\ 



1 One may note that the correct terminology in a statis- 
tical context is actually 'binomial probability mass func- 
tion', owing to the discrete nature of the binomial distri- 
bution, i.e., that there exist a finite number of possible k 
values (the integers from to n, inclusive) to which non- 
zero probabilities may be assigned. (As distinct from the 
alternative case of a 'probability density function', such as 
the Bayesian posterior probability distribution for p consid- 
ered in Section[3] for which non-zero probabilities may only 
be assigned to measurable intervals on the real number line, 
and not individual — or even countable sets of — real num- 
bers.) Nevertheless, to avoid any confusion with the more 
commonly-used definition of the term, 'mass function', in 
astronomy I adopt the shorter expression, 'binomial prob- 
ability function', herein. 



(see, for example, Quirin 197^). Note that the prob- 
abilities given by the n + 1 possible values of k cor- 
respond to the n + 1 terms of the binomial expansion 
of (p + q) n . The number of barred systems counted 
in a given sample of disk galaxies is a classic exam- 
ple of a binomially-distributed variable in astronomy. 
The corresponding expectation value for the number of 
successes is 2fc=o ^ x Pfo, n iP) — n P with a variance 
of YjI=o x P{k, n,p) = npq. Moreover, the expecta- 
tion value for the fraction, k/n, of successes is equal to 
the Bernoulli trial success probability (also referred to 
as the 'underlying population proportion'), p, and its 
variance is pq/n. 



An intermission: Just what is a confi- 
den ce interval ? A s expl a ined eloquently by both 
iKraft et alJ (|l99ll 1 and lRossI l|2003h . there is a funda- 
mental difference between the 'classical' and 'Bayesian' 
definitions of the term 'confidence interval'. In clas- 
sical statistical theory a binomial CI is defined as a 
pair of random variables, P; and P u , (with each ran- 
dom variable nec essarily a finite, re al-valued, measur- 
able function; cf. iRao fc Swift]|2006l ) operating on the 
set of all possible experimental outcomes, 9 = {k : 
^ k ^ n, k e Z}, such that if the experiment were 
to be repeated by a sufficiently large number of in- 
dependent observers then the fraction of observers for 
whom the true value of the underlying population pro- 
portion is covered by their realisation of these random 
variables — i.e., for whom Pi(9i) = pi < p < p u = 
Pu{9i) — is guaranteed to converge to (at least) a spe- 
cific value, c, termed 'the confidence level'. In the 
Bayesian paradigm, on the other hand, the underlying 
population proportion is treated as an unknown model 
parameter and the binomial CI defined as an interval, 
(pi, p u ), to which the experimenter believes may be as- 
signed a probability, c, of containing the true value of p, 
based upon consideration of the likelihood function for 
p given the experimental data and the strength of any a 
priori beliefs/expectations regarding the system under 
study. (Indeed, acknowledging the significant concep- 
tual differences between these alternative approachs 
to the binomial CI, the term 'credible interval' is of- 
ten used instead in Bayesian analysis to avoid confu- 
sion with the classica l nome nclature.) Importantly, as 
noted bv lKraft et aD (|l99ll ). regardless of one's philo- 
sophical position regarding these two statistical sys- 
tems, "the Bayesian definition of confidence intervals 
reflects common astronomical usage better than the 
classical definition". 

Of the three binomial CI gen erators discussed in 
this re view only that attributed to lClopper fc Pearsonl 
(1 19341 ) is consistent with the classical definition for all 
possible values of the underlying population propor- 
tion and sample size. However, I will argue that (at 
least) in the case of the binomial distribution Bayesian 
CIs provide generally more satisfactory behaviour for 
astronomical purposes than their classical counter- 
parts, even when evaluated against a performance di- 
agnostic based on the classical definition — namely, the 
coverage fraction (or 'effective coverage') at given p 
and n. 
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Figure 1: Example likelihood functions for the true value of the underlying population proportion, p, given 
five 'measured' success fractions, p = k/n, for samples of sizes n = 6 (left panel) and n = 36 (right panel). 
In each case the shape of the curve is given by the beta distribution with shape parameters as specified by 
Equation [5J The asymmetric nature of this likelihood function in the small sample size regime is clearly 
evident amongst the n = 6 examples, as is its convergence in the intermediate-to-large sample size regime 
towards a narrower, more symmetric, (pseudo-)normal distribution amongst the n = 36 examples. 



3 The Beta Distribution Gen- 
erator for Binomial Confi- 
dence Intervals 

In astronomical data analysis it is standard practice 
to adopt the measured success fraction (also referred 
to as the 'observed population proportion'), p = k/n, 
as one's 'best guess' of the underlying population pro- 
portion. In statistical terms, p is employed as a point 
estimator for p. The likelihood of observing the re- 
sult, p = k/n, for a given value of p is, of course, pro- 
portional to p k q n ~ k . Normalisation of this likelihood 
function over < p < 1 defines a 'beta distribution' 
with integer parameters, a = k + 1 and b = n — k + 1: 

o/ ia {a + b- 1)! „„! (,„! 
g(a ' b)= (a -l)!(6-l)! P q (2) 

where q = l-p (e.g. iGelman et al.ll2003l : lRosdl2003h . 
Differentiation of this likelihood function reveals that 
our best guess, p, is in fact the maximum likelihood 
estimator of p[j The characteristic shape of the (beta 
distribution) likelihood function for p is illustrated in 
Figure [T] at a variety of 'measured' success fractions 

2 Technically, when p — (or 1) the likelihood function 
for p has no zero first derivative on the open interval, (0, 1), 
although the function itself is indeed strictly increasing as 
p — > (or 1). In this case one may choose to adopt the 
median (50% quantile) of the (beta distribution) likelihood 
function as one's best guess for p, or else to compute a 
'one-sided' confidence interval bounding p instead. In either 
case, one proceeds along similar principles. 



for samples of sizes n = 6 (left panel) and n = 36 
(right panel). At small n, the likelihood function for 
p is markedly asymmetric (except where p = 1/2), 
but at intermediate n it is visibly converging towards 
a narrow, symmetric, (pseudo-) normal distribution — 
the motivation behind the 'normal approximation' dis- 
cussed in Section 31 

Given no a priori knowledge to inform one's ex- 
pectations regarding the experimental outcome one 
may suppose that all values of p are equally 'proba- 
ble'. Formally, this condition is characterised via the 
Bayes-Laplace uniform prior, for which P pr ior(p) = 1 
over < p < 1. Application of Bayes' theorem un- 
der this assumption allows one to treat the normalised 
likelihood function for p as a posterior probability dis- 
tribution. Thus, the quantiles of the beta distribu- 
tion from Equation [2] may be used directly to estimate 
(Bayesian) confidence intervals on the underlying pop- 
ulation proportion given the observed data[j Specifi- 
cally, the lower and upper bounds, p; and p u , defining 
an 'equal-tailed' (or 'central') interval for p at a nom- 
inal confidence level of c = 1 — a are given by the 
quantiles: 

J "Pi c 1 
B(a,b)dp = a/2 and B(a,b)dp = a/2 . (3) 
o Jp u 

3 A stronomers familiar with the work of Burgasscr et al. 
(2003) on binarity in brown dwarfs may be familiar with 
the algorithm for recovering confidence intervals on p given 
in their Appendix, which is in fact equivalent to the 
Bayesi an approach with unif orm prior presented here (al- 
though Burgasser et al. 2Q0j| make no explicit reference to 
either Bayes' theorem or the beta distribution). 
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Note that the bounds of this 'equal-tailed' interval 
(which partition the probability of p greater than p u 
equal to that of p less than pi) will be necessarily asym- 
metric about the maximum likelihood value, p, (except 
at p = 1/2) owing to the asymmetric nature of the 
(beta distribution) likelihood function for p (shown in 
Figure [T}. As I will demonstrate below, binomial CIs 
generated in this manner have one rather desirable 
property, not shared by either th e 'nor mal approxi- 
mation' or the IClopper fc Pearsonl (11934T ) approach — 
namely, their mean effective coverage is consistently 
very close to the nominal confidence level, even in the 
small sample size regime. 

In the upper panel of Figure [2] I examine first the 
effective coverage, c e , of 'equal-tailed' binomial CIs de- 
fined via the beta distribution for a range of population 
proportions and sample sizes (0.025 ^ p ^ 0.975 and 
1 ^ n < 100) at a nominal level of c n 0.683 (la) — 
with the effective coverage defined as the fraction of 
samples drawn from the binomial probability function 
with given p and n for which the corresponding re- 
alisation of the CI under investigation encompasses 
the true population proportion. Thus, the effective 
coverage fractions, c e , presented here are computed 
as the sum of all binomial probabilities, P(k,n,p) 
over {k : ^ k n, k e Z}, for which the triad, 
{k,n,p}, produces a confidence interval, (pi,p u ), con- 
taining (covering) p. 

One of the most striking features of this plot is 
the remarkable sensitivity of the effective coverage to 
the true underlying population proportion and sample 
size. This so-called 'oscillation signature' is an inherent 
property of all deterministic (i.e., non-randomising) 
generators for binomial CIs, arising from the discrete- 
ness of the binomial distributiony Despite these os- 
cillations it is clear that the beta distribution CIs 
do achieve an effective coverage close to (or slightly 
greater than) the desired confidence level over the vast 
majority of the parameter space explored here. In- 
deed, even at the extremes of p < 1/6 and p > 5/6, 
where the oscillations are initially rather large, there 
is evidently a rapid increase in coverage stability with 
increasing sample size, such that the 'oscillation sig- 
nature' is vastly suppressed by n > 40, and effectively 



' iBrown et alj (l2001h describe the 'oscillation signature' 
as the challenge of 'lucky p, lucky n' — namely that for cer- 
tain ('lucky') combinations of underlying population pro- 
portion and sample size there exist two almost equally 
likely p values closely straddling the true p. For instance, 
if p — 1/5 and one has a sample of size n — 3 the possible 
p values are 0, 1/3, 2/3, and 1, occurring with frequen- 
cies 0.512, 0.384, 0.096, and 0.008, respectively. Tailoring a 
binomial CI specifically to this situation, one could define 
pi — p — 2/15 — e and p u — p + 1/5 + e (with e an arbitrar- 
ily small constant necessary to ensure p is contained within 
the open interval, (pi,p u ), for k — and 1), returning an 
effective coverage of c e = 0.512 + 0.384 = 0.896. However, 
applying the same CI generator to a system with p = 1/3 
(and again n — 3) for which the possible p values occur 
with frequencies 0.296, 0.444, 0.222, and 0.037 (rounded to 
3 decimal places), one obtains an effective coverage of only 
c e = 0.444! For further discussion of the impact of the 'os- 
cillation si gnature' on binomial C Is the interested reader is 
referr ed to lAgresti fc Coulll lH998f) and lBrown et al.l ||2001| . 
[20011 . 



eliminated (at least for 0.025 ^ p ^ 0.975) by n > 80 
(unlike in the case of the 'normal approximation' ex- 
amined in Section [4]). 

In the lower panel of Figure [2] I examine the corre- 
sponding mean effective coverage (averaged uniformly 
over 0.025 p ^ 0.975) as a function of sample size. 
Whereas the effective coverage at given p and n shown 
in the upper panel is consistent with the classical no- 
tion of confidence interval performance, the mean ef- 
fective coverage may be considered a 'Bayesian' CI 
performance diagnostic — i.e., if one really does hold 
all p values equally 'probable' a priori then one's 
favoured CI generator should be at least expected to 
provide coverage consistent with the nominal level in 
the longterm average of all equivalent experiments. In- 
spection of the lower panel of Figure [5] confirms a very 
close agreement between the mean effective coverage 
of the beta distribution CI generator and the nominal 
confidence level, independent of n. 

Most modern mathematical software packages pro- 
vide robust, easy-to-use library functions for comput- 
ing beta distribution quantiles (e.g. the QBETA rou- 
tine in R; the Quantile and BetaDistribution 
commands in mathematica; the betaincinv func- 
tion in MATLAB; the ibeta function in IDL; or the 
dist.beta.ppf function in python). Explicit code 
fragments demonstrating the implementation of these 
commands are provided in the Appendix to this pa- 
per, and I advocate strongly the use of these recipes 
for the computation of confidence intervals on binomial 
population proportions in future astronomical studies. 
In Tables [1] and [2] in the Appendix I present compila- 
tions of 'equal-tailed' CIs generated in this manner at 
nominal confidence levels of lcr and 3(J, respectively, 
for all possible observed success counts in sample sizes 
up to n = 20. These tables are intended both as a 
convenient reference for use directly in studies involv- 
ing samples of 20 objects or less, and as a benchmark 
against which to confirm the correct implementation 
of the beta distribution CI generator for users newly 
adopting this technique. 

A note on the prior The (non-informative) 
Bayes-Laplace uniform prior may in fact be viewed 
as the special case of P pr io r (p) = 5(1,1) = 1 within 
a wider family of possible conjugate priors for the bi- 
nomial population proportion based on the beta dis- 
tribution. Another popular non-informative prior for 
y is the Jeffreys' prior of P m i or (p) = BJ l/2, 1/20 (cf. 
iBrown et all l200ll : ICelman et al1l2003h . which is, by 
design, proportional to the square root of the Fisher in- 
formation. Application of the Jeffreys' prior returns a 
posterior probability distribution for p of B(k + l/2, n — 
k + 1/2). The performance of binomial CIs generated 
via beta distribution quantiles based on the Jeffreys' 
prior differ insignificantly from those based on the uni- 
form prior over 0.025 ^ p ^ 0.975 when n > 2 — 
consistent with the description of both these priors as 
'non-informative' (i.e., that even for small sample sizes 

5 Note that the factorial functions used in the beta distri- 
bution definition of Equation[2]must be replaced by gamma 
functions according to the relation, (m)\ = T(m + 1), in or- 
der to handle the non-integer input in this case. 
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Figure 2: The effective coverage, c e , of confidence intervals on the binomial population proportion gen- 
erated from quantiles of the beta distribution at a nominal level of c„ * 0.683 (ltr) over the range 
0.025 ^ p ^ 0.975 and 1 ^ n ^ 100 (upper panel). Averaging the measured c e values uniformly over all p 
at each n returns the mean effective coverage as a function of sample size (lower panel). 



6 



Publications of the Astronomical Society of Australia 




0.0 0.2 0.4 0.6 0.8 1.0 
Observed Population Proportion (p) 



r 

0.0 0.2 0.4 0.6 0.8 1.0 
Observed Population Proportion (p) 



Figure 3: Comparison between the true binomial distribution of the p statistic (i.e., the observed popula- 
tion proportion, k/n) and that assumed by the 'normal approximation'. Specifically, the p distributions 
of the binomial probability function with p = 1/3, 1/2, and 5/6 are contrasted against scaled normal dis- 
tributions with matching means and variances for sample sizes of n = 6 (left panel) and n = 36 (right 
panel), respectively. In the small sample size regime the 'normal approximation' provides a reasonable 
representation of the p distribution at p = 1/2 and 1/3, but not 5/6, while in the intermediate-to-large 
sample size regime even the distribution at p = 5/6 is also clearly converging towards normal. 



the shape of the posterior probability distribution in 
both cases is strongly governed by the likelihood func- 
tion of the observed data). (See the recent review by 
ICousins. Hvmes. fc Tucker!l2009l for a thorough evalu- 
ation of the performance of Bayesian CIs constructed 
with the Jeffreys' prior.) Hence, whilst the specific re- 
sults presented in this paper are computed exclusively 
using the uniform prior, for the purposes of our general 
discussion regarding the superiority of the beta distri- 
bution quantile technique over the ' norma l approxi- 
mation' and the IClopper fc Pearsonl j 19341) approach 
these two non-informative priors may be considered 
interchangable. 



4 The 'Normal Approxima- 
tion' 

For a system with an underlying binomial population 
proportion, p, neither very close to or 1, one may 
suppose (with reference to the Central Limit Theo- 
rem) that the distribution of the p statistic in a series 
of independent samples of a fixed 'large' size will follow 
approximately a normal distribution. Under the as- 
sumptions of this 'normal approximation' (also called 
the 'Poisson error') one may em ploy the standard 
' Wald test' criterion, established bv lWald fc Wolfowit j 
(| 19391 ). to construct a two-sided confidence interval for 
p. Specifically, at a confidence level of c = 1 — a one 
may expect that the true value of p lies within the 



interval: 



< p < p + Zl_ Q /2-\ 



(4) 



where q = 1 — p, and z 1 _ a / 2 is defined with reference 
to the standard normal distribution: 



I 



z 1-q/2 



exp (—a; /2)dx ■■ 



■ a/2 



Values of zi_ a /2 for particular confidence levels 
may be obtained from reference tables in statisti- 
cal textbooks (e.g. Quirin 1978) or computed within 
one's favourite mathematical software package (e.g. 
the QNORM function in R). Of course, the most com- 
monly used formula for constructing error bars on 
measured galaxy bar fractions, p = p + \Jpq/n (e.g. 
lElmeereen et al] Il990l ). is simply the application of 
Equation[4]at Zi_ Q /2 = 1, corresponding to a la confi- 
dence level of c 0.683. The cases of z 1 _ a / 2 = 2 and 3 
(i.e., 2a and 3a errors) correspond to higher confidence 
levels of c 0.954 and 0.997, respectively. 

As noted above, the key assumption behind this 
approach to binomial CI estimation — that the distri- 
bution of p may be approximated via a normal distri- 
bution with mean, p, and variance, pq/n — is reasonable 
only under the conditions of a 'large' sample size and p 
neither very close to or 1. In Figure [3] I compare the 
distribution of the p statistic (computed directly from 
the binomial probability function) against the shape 
of the corresponding 'normal approximation' for three 
different values of the underlying population propor- 
tion (p = 1/3, 1/2, and 5/6) and two different sample 
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sizes (n = 6 and 36) . In the small sample size example 
(n = 6) the 'normal approximation' provides a reason- 
able representation of the p distribution at p = 1/3 
and p = 1/2, but performs poorly at p = 5/6 (i.e., p 
close to 1). However, in the intermediate sample size 
example (n = 36) there is now a clear convergence 
towards a normal distribution in p even at p = 5/6. 
These examples presented in Figure serve to illus- 
trate the nature of deviations from 'normality' in the 
distribution of p at small n and/or extreme p values. 
The impact of these deviations on the performance of 
the 'normal approximation' as a binomial CI generator 
is examined below. 

In Figure [4] I present the effective coverage of bi- 
nomial CIs estimated via the 'normal approximation' 
as a function of p and n at a nominal confidence level 
of c n 0.683 (la). As in the case of the beta dis- 
tribution quantile approach described above, there is 
a clear 'oscillation signature' visible in this figure, re- 
flecting a marked sensitivity in the coverage perfor- 
mance to the value of the underlying population pro- 
portion and sample size0 However, it is also evident 
that the 'normal approximation' suffers a systematic 
decline in performance both for small n and towards 
extreme values of p near or 1, generating binomial 
CIs with effective coverage far below the desired level. 
The strict symmetry of the 'normal approximation' CI 
about the observed success fraction — which at low or 
high p may even extend (unphysically) to p or 
p ^ 1 — regardless of the inherent asymmetry in the 
likelihood distribution for p (see Figure [TJ is the prin- 
cipal cause of these coverage failures. The poor perfor- 
mance of the 'normal approximation' at small n is fur- 
ther highlighted in the corresponding plot of mean ef- 
fective coverage against sample size shown in the lower 
panel of Figure [3] For the la CIs examined here (and 
popularly adopted in studies of the galaxy bar fraction) 
the mean effective coverage of the 'normal approxima- 
tion' is far below the nominal level for n < 20, and 
should thus be strictly avoided in this small sample 
size regime. Indeed, although its mean effective cov- 
erage does ultimately improve with increasing n one 
may be well advised to avoid the 'normal approxima- 
tion' altogether in light of its poor effective coverage 
at extreme p values and the ready availability of a su- 
perior CI generator in the form of the (Bayesian) beta 
distribution quantiles described in Section 

The flaws in the 'normal approximation' as a CI 
generator described above were a great source of con- 



°It is important also to note that this 'oscillation sig- 
nature' is evident even in binomial CIs generated via the 
'normal approximation' at very large s ample sizes , as thor- 
oughly demonstrated by I Brown et ali (2001, 2003). F° r in- 
stance, I Brown et alj d200lT) describe the erratic behaviour 
of the 'normal approximation' coverage at a nominal level 
of c n = 0.95 for a system with p = 0.005, whereby there 
is a steady convergence in c e towards 0.95 for n increas- 
ing until n — 592, at wh ich point coverage falls suddenly 
to c e = 0.792! Similarly. ferown et all d2002fl demonstrate 
that in order to ensure coverage stays at or above a nom- 
inal level of c n = 0.93 for a system with p = 0.1 using 
the 'normal approximation' one requires a sample size of 
at least n = 286, whereas for the Bayesian (Jeffreys' non- 
informative prior) CI this criterion is satisfied by n — 47. 



cern for statisticians in the 1930s, prompting the 
search for alternatives able to ensure universal cov- 
erage of at least the nominal level (thereby satisfying 
the classical definition of the term, 'confidence inter- 
val'), whilst remaining readily computable given the 
limited aids available at the time (such as reference 
tables of quantiles for standard distributions). The 
mos t popular of all su c h pro posed alternati ves was 
the I Clopper fc Pearson! (| 19341 1 approach (cf. iGehrelsl 
1986), which I review below. 

5 The Clopper &; Pearson 
Approach 

In their landmark 1934 paper Clopper & Pearson pre- 
sented a direct method for constructing 'classical' con- 
fidence intervals on inferred population proportions 
based on quantiles of the binomial probability function 
(Equation [T]), guaranteed to provide a coverage prob- 
ability of at least (but likely exc eeding) the nominal 
confid ence level. The 'two-sided' IClopper fc Pearson! 
(| 19341 ) CI at c = 1 — a is constructed by solving the 
following equations for the lower and upper bounds, 
Pi(k) = pi and P u (k) = p u : 

% " Pl)n ~ 3 = Q/2 (for k * 0) (5) 

and 

J (^Jpi(l-Pu) n - J = a/2 (for k * n) (6) 

where k is again the observed number of successes (e.g. 
barred galaxies) in the sample, and n the total sample 
size. Note that in t h e ext reme cases of p = or 1, the 
IClopper~fc Pearson (|l934T l formulae reduce simply to: 

Pl = (a/2) 1/n for p = 1 and (7) 

p u = 1 - (a/2) 1/n for p = . (8) 

Modern mathematical software packages, such as 
R and matlab, support easy-to-use library functions 
(cf. BINOM.test in the STATS package in R; or binofit 
in the S TATISTICS toolbox in ma tlab) for compu- 
tation of IClopper fc Pear son (1934) confidence limits, 
which employ robust algorithms for solution of Equa- 
tions [5] and [6] Alternatively, there exist numerous 
refere nce tables of pre-computed binomial CIs based 
on the lClopper fc Pearsonl (|l934h approach — most no- 
tably those of lGehrelsl i|l986). a popular reference for 
estimating uncertainties in astronomical population 
proportions. 

In the upper panel of Figure [5] I examine 
the effective cove r age o f CIs generated via the 
IClopper fc Pearsonl (|1934D approach as a function of 
p and n at a nominal confidence level of c as 0.683 
(la). In contrast with the results for both the beta 
distributi on and the 'normal approx imation' reviewed 
above the lClopper fc Pearsonl l|l934h CIs provide cov- 
erage far exceeding the nominal confidence level over 
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Figure 4: The effective coverage, c e , of confidence intervals on the binomial population proportion gener- 
ated via the 'normal approximation' at a nominal level of c„ sa 0.683 (la) over the range 0.025 ^ p ^ 0.975 
and 1 ^ n ^ 100 (upper panel). Averaging the measured c e values unformly over all p at each n returns 
the mean effective coverage as a function of sample size (lower panel). 
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Figure 5: The effective coverage, Ce, of c onfidence intervals on the binomial population proportion gen- 
erated via the IClopper &; Pearson! ( 1934I) approach at a nominal level of c„ « 0.683 (la) over the range 
0.025 ^ p ^ 0.975 and 1 ^ n ^ 100 (upper panel). Averaging the measured c e values unformly over all p 
at each n returns the mean effective coverage as a function of sample size (lower panel). 
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Figure 6: Comparison between the mean widths o f binomial CIs generated a t c «s 0.683 (la) via the beta 
distribution, the 'normal approximation', and the lClopper fc Pearson! (J1934) approach, respectively, as a 
function of the underlying population proportion, p, for samples of sizes n = 6 (left panel) and n = 36 
(right panel). 



much of this parameter space. The lClopper fc Pearsonl 
(1934) coverage excess is also clearly evident in the cor- 
responding mean effective coverage for this CI gener- 
ator plotted as a function of sam ple size in the lower 
panel of Figure [S] Although the IClopper fc Pearsonl 
(1934) CIs do eventually converge to the nominal level 
at very large n, in the small-to-intermediate sample 
size regime their mean effective coverage is consis- 
tently far above the desir e d leve l. This point is in fact 
acknowledged in [Gehrcis (1986), although it appears 
not to be widely appreciated considering the frequency 
with which these CIs are treated as a 'gold standard' 
in astronomical papers. 

6 Mean Confidence Interval 
Widths 

To illustrate the influence of the choice of CI gener- 
ator on the estimated magnitude of the relevant un- 
certainties (i.e., the error bar size), I compare in Fig- 
ure [|J] the mean widths of c x 0.683 (1<t) CIs esti- 
mated via the ('equal-tailed') beta distribution quan- 
tile technique, the ' norm al approximation', and the 
IClopper fc Pearsonl (|1934T ) approach as a function of 
p for samples of sizes n = 6 (left panel) and n = 36 
(right panel). In the small sample size regime (where 
the 'normal approximation' fails to provide sufficient 
coverage at p < 1/6 and p > 5/6; see Figure [4| the 
mean CI widths are markedly smaller (by as much 
as Ap ~ —0.15) than those derived using the beta 
distribution technique (which generally provides supe- 
rior coverage at these p values; see Figure (Of 
course, the beta distribution should not be viewed as 
a strict benchmark for the ideal CI width, since its 



coverage is indeed prone to erratic performance at cer- 
tain p values — the 'oscillation signature' to which all 
non-randomising binomial CI generators are prone; al- 
though, as we have argued above, its performance may 
be considered the best of the three approaches exam- 
ined in this study.) In the intermediate sample size 
regime, the mean widths of these these two CI gen- 
erators are in much better agreement, except at the 
extremes of p < 1/20 and p > 19/20 where a marked 
under-estimation is still evident in the 'norma l approx- 
imation' CIs. The IClopper fc Pearsonl l|l934h CIs, on 
the other hand, exhibit a much greater mean width 
than those of the beta distribution or 'normal approxi- 
mation', regardless of p — reflecting the substantial cov- 
erage excess demonstrated for this CI generator in Sec- 
tion [5] (see Figure [5]). These examples verify that the 
choice of CI generator can indeed have a substantial 
impact on the magnitude of the estimated uncertain- 
ties, thereby confirming this choice to be an important 
practical consideration for effective astronomical data 
analysis. 



7 Conclusions 

I have reviewed the performance of three alternative 
methods for estimating confidence intervals on bino- 
mial population proportions; namely, the beta dis- 
tribution qua ntile technique, the ' norm al approxima- 
tion', and thelClopper fc Pearsonl (|1934| ) approach (cf. 
Gehrcis 1986). Despite their current popularity in as- 
tronomical research, the latter two CI generators are 
demonstrated to perform poorly under sampling condi- 
tions routinely encountered in observational studies — 
with the 'normal approximation' failing to provide 
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CIs of sufficient width to achie ve coverage at the 
nomin al confidence level, and the lClopper fc Pearson! 
(| 19341 ) approach producing CIs far wider than neces- 
sary to achieve the nominal coverage. In contrast, the 
(Bayesian) beta distribution quantile technique, is re- 
vealed to be a well-motivated alternative, consistently 
providing a mean level of coverage close to the nomi- 
nal level, even for small-to-intermediate sample sizes. 
Given that the beta distribution generator for binomial 
CIs may be easily implemented using modern math- 
ematical software packages, I advocate strongly that 
this technique be adopted in future studies aiming to 
constrain the true values of astronomical propulation 
proportions (e.g. the galaxy bar fraction, red sequence 
fraction, or merger fraction). 

A CI Code Fragments & CI 
Reference Tables 

Here I provide simple code fragments demonstrating 
the implementation of the beta distribution CI genera- 
tor via standard library routines in R, matlab, math- 
ematica, IDL, and python. The correct performance 
of these code fragments in one's preferred mathemat- 
ical software package may be verified by comparison 
against the reference tables of binomial CIs presented 
here in Tables [1] and [2] As in the main body of this 
paper I denote the nominal confidence level, c, the ob- 
served success count, k, and the sample size, n. In the 
following it is assumed that these variables have al- 
ready been defined computationally by the user with 
c a real/double, and k and n integers. 

In the R statistical package: 
p_lower <- qbeta((l-c)/2,k+l,n-k+l) 
p.upper <- qbeta(l-(l-c)/2,k+l,n-k+l) 

In MATLAB: 
p_lower = betaincinv((l-c)/2,k+l,n-k+l) 
p_upper = betaincinv(l-(l-c)/2, k+1, n-k+1) 

In MATHEMATICA: 
plower = 

Quantile [BetaDistribution [k+1 ,n-k+l] , (l-c)/2] 
pupper = 

Quantile [BetaDistribution [k+1 ,n-k+l] , l-(l-c)/2] 

In IDL (if an 'IDL Analyst' license is available): 
p_lower = 

IMSL_BETACDF( (l-c)/2 ,k+l ,n-k+l , /INVERSE) 
p_upper = 

IMSL_BETACDF ( 1- ( 1-c) /2 , k+1 , n-k+1 , /INVERSE) 

otherwise, iteratively: 

z = FINDGEN( 10000) *0. 0001 

Beta = IBETA(k+l, n-k+1, z) 

il = VALUE_L0CATE(Beta, (l-c)/2) 

iu = VALUE_L0CATE(Beta,l-(l-c)/2) 

p_lower = z [il] 

p_upper = z [ul] 

In PYTHON: 

import scipy . stats . distributions as dist 



p_lower = dist .beta. ppf ( (l-c)/2 ., k+1 , n-k+1) 
p_upper = dist .beta. ppf (1- (l-c)/2 ., k+1 , n-k+1) 
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Table 1: Confidence interval estimates at c « 0.683 (la) on binomial population proportions from quantiles of the beta distribution for all possible 
observed success counts for sample sizes up to 20 
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Table 2: Confidence interval estimates at c « 0.997 (3a) on binomial population proportions from quantiles of the beta distribution for all possible 
observed success counts for sample sizes up to 20 
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